About Unicode and HTML

Unicode and HTML: from the English-language Wikipedia

Unicode and HTML

Web pages authored in Hypertext Markup Language (HTML) may contain multilingual text represented with the Unicode universal character set. Key to the relationship between Unicode and HTML is the relationship between two things: the "document character set", which defines the set of characters that may be present in an HTML document and assigns numbers to them, and the "external character encoding", or "charset", which is used to encode a given document as a sequence of bytes.
In RFC 1866, the initial HTML 2.0 standard, the document character set was defined as ISO-8859-1. RFC 2070 extended it to ISO 10646 (which is basically equivalent to Unicode). The document character set does not vary between documents in different languages or documents created on different platforms. The external character encoding is chosen by the author of the document (or by the software the author uses to create it) and determines how the bytes used to store or transmit the document map to characters from the document character set. Characters not present in the chosen external character encoding may be represented by numeric character references or character entity references.
The relationship between Unicode and HTML tends to be a difficult topic for computer professionals, document authors, and web users alike. The accurate representation of text from different natural languages and writing systems in web pages is complicated by the details of character encoding, markup language syntax, fonts, and varying levels of support by web browsers.
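The distinction between characters and the bytes that encode them can be illustrated with a minimal Python sketch. The sample word and the variable names are assumptions chosen for illustration; only the standard `str.encode` and `bytes.decode` methods are used.

```python
# Illustrative text: "café", written with an escape so the source stays ASCII.
text = "caf\u00e9"

# One abstract character sequence, two external encodings, two byte sequences.
latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

print(latin1_bytes)  # b'caf\xe9'
print(utf8_bytes)    # b'caf\xc3\xa9'

# Decoding with the charset that was used for encoding recovers the characters.
assert latin1_bytes.decode("iso-8859-1") == text
assert utf8_bytes.decode("utf-8") == text

# Decoding with the wrong charset maps the same bytes to different characters:
# the two UTF-8 bytes of U+00E9 are read as U+00C3 and U+00A9.
misread = utf8_bytes.decode("iso-8859-1")
print(misread == "caf\u00c3\u00a9")  # True
```

The characters stay the same in both cases; only the byte representation depends on the chosen charset, which is why a reader of the document needs to know which charset was used.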
== HTML document characters ==
Web pages are typically HTML or XHTML documents. Both types of documents consist, at a fundamental level, of characters, which are graphemes and grapheme-like units, independent of how they manifest in computer storage systems and networks.
An HTML document is a sequence of Unicode characters. More specifically, HTML 4.0 documents are required to consist of characters in the HTML ''document character set'': a character repertoire wherein each character is assigned a unique, non-negative integer ''code point''. This set is defined in the HTML 4.0 DTD, which also establishes the syntax (allowable sequences of characters) that can produce a valid HTML document. The HTML document character set for HTML 4.0 consists of most, but not all, of the characters jointly defined by Unicode and ISO/IEC 10646: the Universal Character Set (UCS).
Like an HTML document, an XHTML document is a sequence of Unicode characters. However, an XHTML document is an XML document, which, while not having an explicit "document character" layer of abstraction, nevertheless relies upon a similar definition of permissible characters that covers most, but not all, of the Unicode/UCS character definitions. The sets used by HTML and XHTML/XML are slightly different, but these differences have little effect on the average document author.
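The idea of a code point, a unique non-negative integer assigned to each character, can be sketched in Python with the built-in `ord` and `chr` functions and the standard `unicodedata` module. The helper name `describe` is hypothetical and exists only for this illustration.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return a single character's code point in U+XXXX notation with its Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe("A"))       # U+0041 LATIN CAPITAL LETTER A
print(describe("\u263a"))  # U+263A WHITE SMILING FACE

# The mapping runs both ways: chr() turns a code point back into its character.
assert chr(0x263A) == "\u263a"
assert ord(chr(0x263A)) == 0x263A
```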
Regardless of whether the document is HTML or XHTML, when stored on a file system or transmitted over a network, the document's characters are ''encoded'' as a sequence of octets (''bytes'') according to a particular character encoding. This encoding may either be a Unicode Transformation Format, like UTF-8, that can directly encode any Unicode character, or a legacy encoding, like Windows-1252, that cannot. However, even when using encodings that do not support all Unicode characters, the encoded document may make use of numeric character references. For example, &#x263A; (☺) is used to indicate a smiling face character in the Unicode character set.
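The fallback to numeric character references can be sketched as follows. This is a rough illustration, not a description of how any particular authoring tool works: the helper name `encode_with_references` and the sample string are assumptions, while `xmlcharrefreplace` is Python's standard error handler that substitutes decimal references, and `html.unescape` is the standard-library function that resolves them.

```python
import html


def encode_with_references(text: str, charset: str) -> bytes:
    """Encode text, replacing characters the charset lacks with numeric character references."""
    return text.encode(charset, errors="xmlcharrefreplace")


smiley = "\u263a"  # WHITE SMILING FACE, not available in Windows-1252

# A strict encode to the legacy charset fails for this character.
try:
    smiley.encode("windows-1252")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)  # False

# With the fallback, the character survives as a decimal reference (9786 == 0x263A).
encoded = encode_with_references("Hi " + smiley, "windows-1252")
print(encoded)  # b'Hi &#9786;'

# A consumer decodes the bytes, then resolves the references.
decoded = html.unescape(encoded.decode("windows-1252"))
print(decoded == "Hi " + smiley)            # True
print(html.unescape("&#x263A;") == smiley)  # True
```

The decimal form `&#9786;` and the hexadecimal form `&#x263A;` refer to the same code point, so both resolve to the same character.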

Excerpt source: Wikipedia, the free encyclopedia